{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deep MNIST for Experts\n",
    "\n",
    "## Load MNIST Data\n",
    "\n",
    "If you are copying and pasting in the code from this tutorial, start here with these two lines of code which will download and read in the data automatically:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracting MNIST_data/train-images-idx3-ubyte.gz\n",
      "Extracting MNIST_data/train-labels-idx1-ubyte.gz\n",
      "Extracting MNIST_data/t10k-images-idx3-ubyte.gz\n",
      "Extracting MNIST_data/t10k-labels-idx1-ubyte.gz\n"
     ]
    }
   ],
   "source": [
    "from tensorflow.examples.tutorials.mnist import input_data\n",
    "mnist = input_data.read_data_sets('MNIST_data', one_hot=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Start TensorFlow InteractiveSession"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "sess = tf.InteractiveSession()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build a Softmax Regression Model\n",
    "\n",
    "In this section we will build a softmax regression model with a single linear layer. In the next section, we will extend this to the case of softmax regression with a multilayer convolutional network.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "x = tf.placeholder(tf.float32, shape=[None, 784])\n",
    "y_ = tf.placeholder(tf.float32, shape=[None, 10])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Variables\n",
    "We now define weights **W** and **b** of our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "W = tf.Variable(tf.zeros([784,10]))\n",
    "b = tf.Variable(tf.zeros([10]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "sess.run(tf.global_variables_initializer())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "y = tf.matmul(x,W) + b\n",
    "cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train the Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)\n",
    "\n",
    "for _ in range(1000):\n",
    "    batch = mnist.train.next_batch(100)\n",
    "    train_step.run(feed_dict={x: batch[0], y_: batch[1]})\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.9168\n"
     ]
    }
   ],
   "source": [
    "accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))\n",
    "print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Improving\n",
    "91% accuracy is not too good. Now, we will try to add some convolutional neural network to improve accuracy to 99.2%."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def weight_variable(shape):\n",
    "  initial = tf.truncated_normal(shape, stddev=0.1)\n",
    "  return tf.Variable(initial)\n",
    "\n",
    "def bias_variable(shape):\n",
    "  initial = tf.constant(0.1, shape=shape)\n",
    "  return tf.Variable(initial)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Convolution and Pooling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def conv2d(x, W):\n",
    "    return tf.nn.conv2d(x, W, strides=[1,1,1,1],padding='SAME')\n",
    "\n",
    "def max_pool_2x2(x):\n",
    "    return tf.nn.max_pool(x, ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## First Convolutional Layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "W_conv1 = weight_variable([5, 5, 1, 32])\n",
    "b_conv1 = bias_variable([32])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "x_image = tf.reshape(x, [-1,28,28,1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)\n",
    "h_pool1 = max_pool_2x2(h_conv1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Second Convolutional Layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "W_conv2 = weight_variable([5, 5, 32, 64])\n",
    "b_conv2 = bias_variable([64])\n",
    "\n",
    "h_conv2 = tf.nn.relu(conv2d(h_pool1,W_conv2) + b_conv2)\n",
    "h_pool2 = max_pool_2x2(h_conv2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Densely Connected Layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "W_fc1 = weight_variable([7 * 7 * 64, 1024])\n",
    "b_fc1 = bias_variable([1024])\n",
    "\n",
    "h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])\n",
    "h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dropout\n",
    "To reduce overfitting, we will apply dropout before the readout layer. We create a placeholder for the probability that a neuron's output is kept during dropout. This allows us to turn dropout on during training, and turn it off during testing. TensorFlow's tf.nn.dropout op automatically handles scaling neuron outputs in addition to masking them, so dropout just works without any additional scaling.\n",
    "\n",
    "For this small convolutional network, performance is actually nearly identical with and without dropout. Dropout is often very effective at reducing overfitting, but it is most useful when training very large neural networks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "keep_prob = tf.placeholder(tf.float32)\n",
    "h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Readout Layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "W_fc2 = weight_variable([1024, 10])\n",
    "b_fc2 = bias_variable([10])\n",
    "\n",
    "y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train and Evaluate the Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "step 0, training accuracy 0.06\n",
      "step 100, training accuracy 0.86\n",
      "step 200, training accuracy 0.9\n",
      "step 300, training accuracy 0.98\n",
      "step 400, training accuracy 0.92\n",
      "step 500, training accuracy 0.96\n",
      "step 600, training accuracy 0.94\n",
      "step 700, training accuracy 0.98\n",
      "step 800, training accuracy 0.98\n",
      "step 900, training accuracy 0.98\n",
      "step 1000, training accuracy 0.96\n",
      "step 1100, training accuracy 0.92\n",
      "step 1200, training accuracy 0.98\n",
      "step 1300, training accuracy 0.94\n",
      "step 1400, training accuracy 0.96\n",
      "step 1500, training accuracy 0.94\n",
      "step 1600, training accuracy 0.98\n",
      "step 1700, training accuracy 0.96\n",
      "step 1800, training accuracy 1\n",
      "step 1900, training accuracy 0.98\n",
      "step 2000, training accuracy 0.96\n",
      "step 2100, training accuracy 0.98\n",
      "step 2200, training accuracy 1\n",
      "step 2300, training accuracy 0.96\n",
      "step 2400, training accuracy 0.98\n",
      "step 2500, training accuracy 1\n",
      "step 2600, training accuracy 0.98\n",
      "step 2700, training accuracy 1\n",
      "step 2800, training accuracy 0.96\n",
      "step 2900, training accuracy 1\n",
      "step 3000, training accuracy 0.94\n",
      "step 3100, training accuracy 0.98\n",
      "step 3200, training accuracy 1\n",
      "step 3300, training accuracy 1\n",
      "step 3400, training accuracy 1\n",
      "step 3500, training accuracy 0.98\n",
      "step 3600, training accuracy 0.98\n",
      "step 3700, training accuracy 0.98\n",
      "step 3800, training accuracy 1\n",
      "step 3900, training accuracy 1\n",
      "step 4000, training accuracy 0.98\n",
      "step 4100, training accuracy 0.98\n",
      "step 4200, training accuracy 1\n",
      "step 4300, training accuracy 1\n",
      "step 4400, training accuracy 1\n",
      "step 4500, training accuracy 0.98\n",
      "step 4600, training accuracy 0.98\n",
      "step 4700, training accuracy 0.98\n",
      "step 4800, training accuracy 1\n",
      "step 4900, training accuracy 0.98\n",
      "step 5000, training accuracy 1\n",
      "step 5100, training accuracy 1\n",
      "step 5200, training accuracy 1\n",
      "step 5300, training accuracy 1\n",
      "step 5400, training accuracy 1\n",
      "step 5500, training accuracy 1\n",
      "step 5600, training accuracy 1\n",
      "step 5700, training accuracy 1\n",
      "step 5800, training accuracy 1\n",
      "step 5900, training accuracy 0.98\n",
      "step 6000, training accuracy 0.98\n",
      "step 6100, training accuracy 0.98\n",
      "step 6200, training accuracy 0.98\n",
      "step 6300, training accuracy 1\n",
      "step 6400, training accuracy 1\n",
      "step 6500, training accuracy 1\n",
      "step 6600, training accuracy 0.98\n",
      "step 6700, training accuracy 0.98\n",
      "step 6800, training accuracy 1\n",
      "step 6900, training accuracy 1\n",
      "step 7000, training accuracy 1\n",
      "step 7100, training accuracy 0.98\n",
      "step 7200, training accuracy 0.98\n"
     ]
    }
   ],
   "source": [
    "cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))\n",
    "train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)\n",
    "correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))\n",
    "accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))\n",
    "\n",
    "sess.run(tf.global_variables_initializer())\n",
    "\n",
    "# By default, the Saver handles every Variables related to the default graph\n",
    "all_saver = tf.train.Saver() \n",
    "\n",
    "for i in range (20000):\n",
    "    batch = mnist.train.next_batch(50)\n",
    "    if i % 100 == 0:\n",
    "        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})\n",
    "        print(\"step %d, training accuracy %g\"%(i, train_accuracy))\n",
    "        \n",
    "    train_step.run(feed_dict={x: batch[0], y_:batch[1], keep_prob:0.5})\n",
    "\n",
    "all_saver.save(sess, \"model\")\n",
    "print(\"test accuracy %g\"%accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reference\n",
    "* https://www.tensorflow.org/get_started/mnist/pros"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [conda root]",
   "language": "python",
   "name": "conda-root-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
